
Conversation

@jeffbolznv jeffbolznv (Collaborator)

Do masking on whole dwords, fetch all scales at once.

pp512 results on RTX 4070 (t/s):

Model                              before    after
Phi-3-mini-4k-instruct-q4.gguf     4998.55   5322.34
llama-3.2-3b-instruct-q5_k_m.gguf  5573.48   6082.11
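
To illustrate the first half of the title, here is a minimal C sketch of the dword-masking idea, assuming a simplified 4-bit packing. It is not the Vulkan shader code from this PR; the function names and the 0x0F0F0F0F mask layout are illustrative assumptions. The point is that one 32-bit load plus one mask extracts four low nibbles at once, instead of four separate byte loads and masks.

```c
/* Minimal illustrative sketch (not the actual shader code from this PR):
 * unpack 4-bit quantized values by masking a whole dword at once instead of
 * masking byte by byte. The packing layout here is a simplified stand-in
 * for the real quant block formats. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Byte-at-a-time: one load and one mask per low nibble. */
static void unpack_low_nibbles_per_byte(const uint8_t *q, uint8_t *out) {
    for (int i = 0; i < 4; ++i) {
        out[i] = q[i] & 0x0F;
    }
}

/* Dword-at-a-time: one 32-bit load and one mask yield four low nibbles. */
static void unpack_low_nibbles_per_dword(const uint8_t *q, uint8_t *out) {
    uint32_t d;
    memcpy(&d, q, sizeof d);     /* one 32-bit fetch */
    d &= 0x0F0F0F0Fu;            /* mask every byte's low nibble together */
    memcpy(out, &d, sizeof d);   /* mask is per-byte, so endian-independent */
}

int main(void) {
    const uint8_t q[4] = { 0xA1, 0xB2, 0xC3, 0xD4 };
    uint8_t a[4], b[4];
    unpack_low_nibbles_per_byte(q, a);
    unpack_low_nibbles_per_dword(q, b);
    for (int i = 0; i < 4; ++i) {
        printf("%x %x\n", a[i], b[i]);   /* both methods yield 1, 2, 3, 4 */
    }
    return 0;
}
```

The "fetch all scales at once" half of the title follows the same spirit: read the block's scale bytes into locals in one pass before the dequantization loop, rather than issuing a separate small load per sub-block.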

@jeffbolznv jeffbolznv requested a review from 0cc4m January 12, 2025 20:00
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Jan 12, 2025
@0cc4m 0cc4m (Collaborator) left a comment


Looks good and I also see a decent improvement on RTX 3090.

@0cc4m 0cc4m merged commit 466300f into ggml-org:master Jan 16, 2025
2 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025: Do masking on whole dwords, fetch all scales at once.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025: Do masking on whole dwords, fetch all scales at once.
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025: Do masking on whole dwords, fetch all scales at once.

Labels

ggml (changes relating to the ggml tensor library for machine learning), Vulkan (Issues specific to the Vulkan backend)

2 participants